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Abstract —Planning under uncertainty is a key requirement 
for physical systems due to the noisy nature of actuators and 
sensors. Using a belief space approach, planning solutions tend 
to generate actions that result in information seeking behavior 
which reduce state uncertainty. While recent work has dealt 
with planning for Gaussian beliefs, for many cases, a multi¬ 
modal belief is a more accurate representation of the underlying 
belief. This is particularly true in environments with information 
symmetry that cause uncertain data associations which naturally 
lead to a multi-modal hypothesis on the state. Thus, a planner 
cannot simply base actions on the most-likely state. We propose 
an algorithm that uses a Receding Horizon Planning approach 
to plan actions that sequentially disambiguate the multi-modal 
belief to a uni-modal Gaussian and achieve tight localization on 
the true state, called a Multi-Modal Motion Planner (M3P). By 
combining a Gaussian sampling-based belief space planner with 
M3P, and introducing a switching behavior in the planner and 
belief representation, we present a holistic end-to-end solution for 
the belief space planning problem. Simulation results for a 2D 
ground robot navigation problem are presented that demonstrate 
our method’s performance. 

1. Introduction 

Motion planning for robotics involves dealing with the 
uncertain nature of physical systems i.e. noisy actuators and 
sensors as well as changes in the environment in which the 
robot operates. The motion or actuator uncertainty makes it 
difficult to execute precise actions and sensing uncertainty 
makes it impossible to determine the exact state of the 
robot. Further, changes in the environment can reduce the 
effectiveness of plans computed offline. Thus, unless a plan 
can be updated on the fly to account for new constraints, 
the plan might fail. A significant amount of research has 
gone into developing probabilistic methods to achieve robust 
performance for practical systems. In the probabilistic 
approach, the aim is to develop methods that maximize 
the probability of achieving a desired state. State of the art 
methods rely on a probability distribution over the system’s 
state (called the belief) and develop solutions in the belief 
space that enable us to drive the system belief from an initial 
to a desired belief. In the case of a mobile robot, we may wish 
to drive the system from a start location to a goal location, 
or in the case of a manipulator, manipulate an object from 
an initial to some final state. In general, planning for systems 
under uncertainty belongs to the class of Partially-Observable 
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Markov Decision Process (POMDP) problems which are 
known to be computationally intractable. 

Recent work, in particular, sampling based methods, have 
shown varying degrees of success in solving the POMDP 
problem. In [14], the authors construct a belief space variant 
of a Probabilistic RoadMap (PRM) [7] to generate plans that 
result in information gathering actions which minimize state 
uncertainty at the goal. The authors use linearized process and 
measurement models, and a Gaussian belief parametrization. 
In [4], a graph is constructed in belief space and pruned 
successively, ultimately resulting in a tree in belief space 
which guarantees convergence to the optimal solution in the 
limit of infinite samples. These methods along with others 
[5], [16], [8] provide solutions that are dependent on the 
initial belief. This makes them computationally inefficient for 
real-time replanning. Feedback-based Information RoadMap 
(FIRM) [3] also builds a belief space variant of a PRM, but 
this work differs from the previous ones in that it enables 
re-use of the offline computations to compute new policies 
online. FIRM introduces belief stabilizers at the graph nodes 
which act as funnels and lead to edge independence in the 
FIRM graph. We note that the different instantiations of this 
method also assume a Gaussian belief representation. This 
method is shown to work well in practice in [2] on a mobile 
robot operating indoors. 



Fig. 1: A kidnapped robot scenario in a world with 4 identical 
rooms. The actual robot is depicted by the blue outlined 
disk, whereas the hypothesis are depicted by red disks. All 
hypothesis are equally likely. The dashed green arrows show 
a possible control action based on the hypothesis in the top- 
right room that can result in collision for the actual robot. 





















A. Issues with Gaussian Belief Representation 

In [2] situations such as kidnapping (lost-robot), were 
dealt with by expanding the error covariance such that the 
Bayesian estimator (in this case a Kalman filter) is able to 
account for the unexpected data associations/observations or 
large innovations. This works well as long as there are no 
uncertain data associations. The robot was able to enter into 
an Information Gathering Mode and make small cautious 
movements to gain enough information until the error 
covariance converged to a small threshold. Certain situations 
can arise where data association between what is observed and 
the robot’s a-priori map leads to a multi-modal hypothesis. 
An example is the data association problem for a kidnapped 
robot operating in a symmetric environment. Imagine a 
mobile robot equipped with a laser scanner operating in a 
world where there are identical rooms as shown in Fig. 1. 
To the laser, each and every room appears identical. Thus, if 
the robot is switched off and placed randomly in one of the 
rooms, on waking up it has no way of knowing exactly where 
it is. In a sense, based on the observations, the robot would 
think it could be in either of the four rooms at that instant 
since the laser readings would look alike. We can extend 
the previous statement to say that when sensory information 
leads to uncertain data associations, the robot may believe 
itself to be in one of multiple states at the same time. Such a 
situation implies that the pdf of the robot’s belief cannot be 
represented by a uni-modal Gaussian. 

In [12], the authors investigate the grasping problem with a 
multi-modal hypothesis of the gripper’s state. Their method 
picks the most-likely hypothesis and a fixed number of 
samples from the belief distribution, then using a Receding 
Horizon Control approach, belief space trajectories are found 
using direct transcription that maximize the observation gap 
between the most-likely hypothesis and the drawn samples. 
The authors use this approach to prove/disprove the most 
likely hypothesis. In [13], the correctness and complexity of 
the algorithm presented in [12] is analyzed and proven to 
eventually converge to the goal region in belief space with 
as low as two samples. In [11], the authors build upon [12] 
wherein they transpose the non-convex trajectory planning 
problem in belief space to a convex problem that minimizes 
the average log-weight of the belief-modes and a quadratic 
action cost. 

Fundamentally, our work and the work in [11, 12] aim to 
achieve the same goal of proving/disproving hypothesis of the 
robot’s state. At the very basic level, these methods choose 
actions that lead to information gathering behavior such that 
different modes of the robot’s belief are expected to observe 
different information, thus disambiguating the multi-modal 
hypothesis. In our method, we develop an information graph 
node based receding horizon planner that sequentially disam¬ 
biguates the different hypothesis encoded in the multi-modal 
belief state. Further, the technique developed is guaranteed 
to drive the belief state into a unimodal belief state from 


where a Gaussian planner such as FIRM can take over to 
complete the robot’s mission. We believe that our method 
is computationally more efficient than the optimization based 
technique in [11, 12]. It is also able to deal with the kidnapped 
robot scenario which may not be possible to address using 
the trajectory optimization based technique in [11, 12] due 
to the difficulty of generating an initial feasible plan for the 
widely separated modes in the presence of obstacles. We 
further believe that our planner is applicable to any situation 
wherein a multi-modal belief may arise in the robot’s state due 
to uncertain data associations. We also show how the multi 
modal planner can be seamlessly combined with a unimodal 
belief space planner such as FIRM to facilitate belief space 
planning for robotic tasks where Gaussian/ unimodal belief 
representations may not be valid during the entirety of the 
task. 

Contributions’. The key contributions of our work are as 
follows: 

1) We represent the belief with a Gaussian Mixture Model 
(GMM) rather than particles. 

2) Instead of basing actions on the most-likely hypothesis, 
we create candidate actions based on each mode and 
evaluate the best one. 

3) We use a sampling based planner i.e. RRT* [6] to plan 
candidate trajectories. (One can also simply use RRTs 
[10] but due to insignificant overhead in using RRT* 
over RRT we prefer RRT* as it gives us the benefit of 
optimality) 

4) We introduce a switching behavior in the belief repre¬ 
sentation during the online-phase from Gaussian to non- 
Gaussian, and back, as required. Our argument is that 
most of the times, the belief is well represented by a 
Gaussian, wherever this is not the case, we switch to 
a GMM and our algorithm creates plans that converge 
back to a uni-modal Gaussian. 

5) We present simulation results for a 2D navigation prob¬ 
lem in which a robot is kidnapped. 

II. Problem statement 

Let Xk, Uk, and Zk represent the system state, control 
input, and observation at time step k respectively. Let 
X, U, and Z denote the state, control, and observation 
spaces respectively. It should be noted that in our work, 
the state Xk refers to the state of the mobile robot i.e. 
we do not model the environment and obstacles in it 
as part of the state. The sequence of observations and 
actions are represented as Zi-j = • • • , and 

Ui:j = {ui^Ui-^i,' - ^Uj} respectively. The non-linear state 
evolution model / and measurement model h are denoted 
as Xk+i = f{xk,Uk,Wk) and Zk = h{xk,Vk), where 
Wk ^ A/’(0, Qk) and Vk ^ A/’(0, Rk) are zero-mean Gaussian 
process and measurement noise, respectively. 

We assume that the robot is initially tasked to go from a 
start location to a goal location. For this purpose, we use a 
Gaussian Belief Space Planner such as FIRM. However, during 



the course of the task, say at time k, the robot is kidnapped, or 
gets lost, leading to uncertain data associations for its sensor 
readings. In such a case, the belief can be represented 
by a GMM at time /c as a weighted linear summation over 
Gaussian densities. Let Wi^k, and be the importance 
weight, mean vector and covariance matrix associated to the 
Gaussian rrii respectively at time k, then 

N 

^ ^ '^i,k'^i,k-) '^i,k ^ {k'i,k-) ( 1 ) 

i=l 

Our goal is to construct a belief space planner p{bk) 
such that under the belief space planner, given any initial 
multi-modal belief bo, the belief state process evolves such 
that br = mr, where tut = St) for some finite time 

T. 


In other words, our goal is to construct a belief space 
planner such that it is guaranteed to drive the initial multi¬ 
modal belief into a unimodal belief in finite time. The basic 
idea is that once such a unimodal belief is achieved, a 
Gaussian belief space planner such as FIRM can take over 
again and guide the robot towards its mission goal which was 
interrupted due to the kidnapping. 

Note that we do not require optimality from our planner, 
only that it stabilize the belief state process to a unimodal 
belief in a guaranteed fashion. Further, albeit the scenario 
used to motivate the problem is the kidnapped robot situation, 
the method proposed is general, and can be extended to any 
planning situation where a multi-modal belief arises in the 
robot state due to uncertain data associations. 

III. Methodology 

One cannot simply base actions on the mean, as in a multi¬ 
modal scenario, it would not make sense in the physical world. 
Again, taking the example shown in Fig. 1, the mean of all 
the modes may lie within some obstacle which obviously is 
not a good choice to base actions on. Our algorithm creates 
candidate plans for each belief mode that guide it to a state 
called the target state such that the targets of all the modes have 
minimal information overlap. For each policy, we simulate the 
expected information gain which we define as the reduction in 
the discrete number of modes and choose the gain maximizing 
policy. 

A. Belief Propagation Using Gaussian Mixture Model 

We adopted the Gaussian-Mixture-Model (GMM) for belief 
representation because it provides a seamless transition from 
multiple to single hypothesis scenarios and vice versa during 
localization and planning. We found in our experiments that 
the particle filter alternative [9] which is generally used for 
global localization in a kidnapped situation is slower in con¬ 
vergence and less robust compared to the approach presented 
here. Our method also incurs less computational cost compared 
to PF-Localization. Once the lost state is detected, we need to 


generate a rich set of samples from which we can converge to 
the most likely modes. We uniformly sample the configuration 
space and set these samples as the means p^i^k of the modes 
of the Gaussian and assign equal covariance to each mode. 
Algorithm 1 shows how we generate the initial multi-modal 
belief. Note that Uk, the number of Gaussian-modes at time 
k, can vary depending on the observational sequences as new 
hypotheses are added and some die out. Further, we keep the 
weights normalized such that Ylf=i'^i,k = 1. As the robot 
moves and gets observations, the importance weights Wi^s 
are updated based on the measurement likelihood function as 
shown in Eq. 2. 


^ -0.5T>2 

^i,kO- 
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where Di^k is the Mahalanobis distance between the sensor 
observation and most-likely observation for mode such that 


Df^ = {h{xk,Vk) - h{iJLi,0)yRP{h{xk,Vk) - ft.(/u,,0)). 

( 3 ) 

Depending on the state of the robot, individual hypotheses 
and data association results, we might have several cases. 

1) The number of real landmarks observed could be 
equal, less or more than the number of predicted obser¬ 
vations n/i. where hi is the predicted observation vector 
for the i'th mode. 

2) The number of associated real and predicted observa¬ 
tions riznh could be equal or less than rih or both. 

Thus, to update the weight of each hypothesis, we factor 
in the above information. First we estimate the Mahalanobis 
distance Di^k between the predicted and observed landmarks 
that are matched by the data association module. Then we 
multiply this weight by a factor 7, which models the effect 
of duration for which the robot and a hypothesis predict to 
see different information. The entire weight update algorithm 
is described in algorithm 2. After each weight update step, 
we remove modes with negligible contribution to the belief 
i.e. when Wi^k-\-i < where 6^ is a user defined parameter 
for minimum weight threshold (1% of the total weight in our 
simulations). 

B. Multi-Modal Motion Planner (MSP) 

The task of the non-Gaussian planner is to generate 
a sequence of control actions such that the belief state 
converges to a uni-modal Gaussian. We call our planner M3P 
i.e. Multi-Modal Motion Planner. Algorithm 3 describes the 
planner’s key steps and Fig. 2 describes the basic functionality 
of the planner in a sample scenario. 

The key steps in the MSP planner are steps 3 and 4 
in Algorithm 3 that generate the candidate policies for the 
different modes, and then choose one of the policies based 
on the expected information gain. In the following section, 
we give an uniqueness graph (node) based instantiation of 




(a) A candidate policy is gen¬ 
erated for each mode and we 
pick the policy with maximum 
expected gain. 



(b) Policy 3 is chosen and exe¬ 
cuted, leading all the hypothesis 
out of the different rooms. Mode 
7714 expects to see a landmark 
outside the door which the robot 
does not see. 



(c) Mode 7774 is rejected and a 
new set of candidate policies is 
computed. 


Fig. 2: Key steps in the Receding Horizon Control methodology for a multi-modal belief. 


Algorithm 1: Generating Multi-Modal Belief State 

1 Input: So (Initial Covariance) 

2 {/i} = Sample valid states uniformly; 

3 for iii G {/i} do 

4 rrii = Compose belief mode with mean and 
covariance Sq ; 

5 Add mode rrii to belief b ; 

6 Assign uniform weight to all modes ; 

7 while not converged to fixed number of modes do 

8 Update mode weights in belief 6; 

9 Remove modes with weights lower than threshold ; 

10 return b; 


Algorithm 3: MSP: Multi-Modal Motion Planner 

1 Input: b (Belief) 

2 while b 7 ^ S) do 

3 n = Generate candidate policy for each belief mode ; 

4 TT = Pick policy from H with maximum expected 
information gain ; 

5 forall the U G TT do 

6 b = Apply action u and update belief; 

7 if Change in number of modes \ \ Expect a belief 
mode to violate constraints then 

8 ^ break; 

9 return b; 


Algorithm 2: GMM Weight Update 


1 Input: Wi^k,hi,k+i 

2 Output: Wi^k+C 

3 z/c+i = Get sensor observations ; 

4 Uz = Number of landmarks observed in Zk^i\ 

5 'Tiznh = Data association between and Zk+i ; 

6 w- = Update and normalize weight according to 
likelihood function ; 

7 if rih 7 ^ || then 

8 a = max{l+nz-n^nh,f + nh-n^nh)', 

9 j3 = /3 -h St ; 

[0 j = ^-O.OOOla/3 . 


11 else 

12 

13 

14 


/3 = /3-1.0 ; 

ifj3<0 then 

L /3 = 0; 


15 Wi^k-hi = ^Ik+i'y ’ 

16 return Wi^k+i, 


the planner which allows us to recover from kidnapped robot 


scenarios. 

C. Node based Multi-Modal Motion Planner (NBM3P) 

Here we give a particular instantiation of the two key 
steps required to enable the MSP planner; 1) generating the 
candidate policies and, 2) picking the optimal policy. 

1) Generating a Set of Candidate Policies: In a multi¬ 
modal scenario no-one action based on a belief mode can 
be said to be the best action. The best action for one 
mode may or not be good for the other modes. Thus, in a 
multi-modal scenario, we claim that the best action to take 
is one that guides the robot without collision through a path 
that results in information gain such that the belief converges 
to a uni-modal Gaussian pdf. We reduce the infinite set of 
possible control actions to a discrete number of policies by 
picking a target state for each mode to go to and generate 
control actions that guide each mode to its corresponding 
target. 

Picking the target state for a mode: We introduce the 
concept of a uniqueness graph Gu (computed offline) that 
allows us to pick target states for each belief mode. The 
uniqueness graph Gu is constructed using information from 






















the FIRM graph. Each FIRM node is added as a node 
in Gu (only x^y^O). Once a node is added, we calculate 
what information the state represented by that node can 
observe in the environment. Using this information we add 
an edge E^is (undirected) between two nodes Va and if 
both nodes see similar information. In our landmark based 
observation model, each landmark has an ID. Thus if both 
nodes observe the same ID or IDs then an edge is added 
and the weight of the edge is equal to the number of similar 
landmarks observed. The edge weight gives an indication of 
the similarity (or conversely uniqueness) in the information 
observed. If the weight of an edge is higher, it means the 
states represented by the vertices of that edge are more likely 
to observe similar information. To find the target for a belief 
mode rrii, we first choose the set of nodes Ni which belong 
to the neighborhood of jii (which is the mean of mode rrii). 
Then, we find the node G Ni which observes information 
that is least similar in appearance to that observed by nodes 
in the neighborhood Nj of mode rrij where j ^ i. We are 
trying to solve the optimization problem. 


{v{,vl .. .vl^) = argmin {h{vi, 0 )r]h{v 2 , 0 )r ].. .r]h{vn,0)). 

Vi,V2,---,Vn 

(4) 

To do this, first we calculate the total weight of the outgoing 
edges from every node v e Ni to nodes in all other neigh¬ 
borhoods Nj where j ^ i. The node which has the smallest 
outgoing edge weight, is the target candidate. Algorithm 4 
describes in detail the steps involved. 

Algorithm 4: Finding the target for a mode in a multi- 
modal belief _ 

1 Input: rrii^k ^ Gu \ H belief mode, uniqueness graph 

2 Output: v\\ I I target node 

3 forall the mi^k ^ do 

4 Ni = Find nodes in Gu within neighborhood of 
radius R centered at ; 

5 minWeight = Arbitrarily large value; 

6 vl = -1 ; 

7 forall the v G Neighborhood Ni of mode rrii^k do 

8 for Nj G set of all neighborhoods of the belief modes 
and j 7 ^ i do 

9 re = 0 ; 

10 forall the e G Edges connected to v do 

11 I forall the p G Nj do 

if p is a target of edge e then 
^ w-\- = edgeWeight(e); 

14 if w < minWeight then 

15 minW eight = w\ 

16 v\ = V \ 

17 return v\\ 


Connecting a mode to its target: Once we have picked 
the targets corresponding to each mode, we need to find the 
control action that can take the mode from its current state to 
the target state. We generate the open loop control sequence 
that takes each mode to its target using the RRT* planner. 
RRT* is chosen because it is computationally cheap and can 
incorporate the system’s kinodynamical constraints. 

2) Picking the Optimal Policy: Once we have generated 
the set of candidate policies. We need to evaluate the expected 
information gain from each policy and pick the optimal policy 
that maximizes this information gain. We model this informa¬ 
tion gain as the discrete change in the number of modes. This 
implies that instead of doing Monte Carlo simulations with 
noisy observations and motion to generate all possible belief 
trajectories, we can simulate the most-likely belief trajectory. 
This helps reduce the computational burden significantly. We 
know that one policy may or may not be good for all the 
modes i.e. a policy based on one mode may lead to collision 
for the other modes. Therefore, we need a way of penalizing 
a candidate policy if it results in collision. We introduce a 
penalty Cfaulk where Cfau is a fixed value (10^) and k is the 
step during execution at which the collision takes place. Thus, 
policies which result in a collision much further down are 
penalized less compared to policies that result in immediate 
collision. The steps to calculate the expected information gain 
for a candidate policy are: 

1) Pick a candidate policy tt^ corresponding to mode pi 

2) Assume that the robot is actually at one of the modes. 

3) Execute the policy and propagate all the modes and 
measure the information gain (the reduction in the 
number of modes). 

4) Repeat steps 2-3 for each mode and sum the weighted 
information gain weighted by the mode weights. 

We repeat the above steps for all policies in the set If and 
pick the policy with maximum weighted information gain. 

Algorithm 5 shows how we generate the optimal policy tt. 
Two key functions to note here are: 

1) findTargetState(h\) : Finds a FIRM node in the neigh¬ 
borhood of mode h\ at time k such that by going to this 
node, the robot can gain the maximum possible infor¬ 
mation. Algorithm 4 shows how this function works. 

2) calculateInformationGain('Ki^h\) : Given a control se¬ 
quence TTi and mode it calculates the expected 
information gain. Algorithm 6 shows how the expected 
information gain is calculated for the candidate policies. 

D. Analysis 

In this section, we show that the basic receding horizon 
planner M3P will guarantee that an initial multi-modal belief 
is driven into a unimodal belief in finite time. First, we make 
the following assumptions: 

Assumption 1: Given a multi-modal belief 
bk = for every mode mi^k, there exists a 

disambiguating planner pi{.) in the sense that if the robot 
was really in mode i, the planner’s actions would confirm 
that the robot was at mode i. 




Algorithm 5: Calculating the Optimal Policy k 

1 forall the rrii^k ^ Mk do 

2 target state for mode {rrii^k) ; 

3 TT = Find open loop controls from jii^k to v\^ using 
RRT* ; 

4 n.append(7r); 

5 forall the G 11 do 

6 forall the rrii^k ^ Mk do 

7 Alij = Calculate expected information gain for 
policy TTj assuming robot is at 

n 

8 ’ 

i=i 

9 TT* = Pick policy from 11 that has maximum gain 51 ; 

10 return tt*; 


Algorithm 6: Calculating Expected Information Gain 


1 Input: iTj^rrii^k 

2 Output: Alij 

3 X{) = IJji^k 1 

4 riQ = Current number of belief modes; 

5 Alij = 0 ; 

6 forall the Uk G tt^ do 


7 

8 
9 

10 

11 

12 


Xk+1 = f{xk,Uk,0) ; 

^k-\-l ~ 9 

Propagate individual Kalman filters with control Uk 

and most likely observation Zk^i ; 

if Any belief mode is in collision configuration then 

break; 


robot was at mode and the belief collapses into a unimodal 
belief thereby proving the result. In case 2, due to Assumption 
2, we know that the robot could not have started at mode 
and thus, the number of modes is reduced by at least one. 
After this disambiguation, we restart the process as before 
and we are assured that atleast one of the modes is going 
to be disambiguated and so on. Thus, it follows given that 
we had a finite number of modes to start with, the belief 
eventually converges to a unimodal belief. Further, since each 
of the disambiguation epochs takes finite time, a finite number 
of such epochs also takes a finite time, thereby proving the 
result. 

■ 

The above result shows that the basic MSP algorithm will 
stabilize the belief process to a unimodal belief under Assump¬ 
tions 1 and 2. In order to show that a particular instantiation 
of the planner, such as the target node based planner NBM3P 
detailed above, stabilizes the belief to a single mode, we need 
to show that the particular planner satisfies Assumption 1. It 
can be seen that under NBM3P, by design, the robot reaches 
a unique disambiguating location if it started in a particular 
mode, and thus. Assumption 1 is satisfied. Hence, this leads 
to the following corollary. 

Corollary 1: The Node based receding horizon planner, 
NBM3P, drives any initial multi-modal belief into a unimodal 
belief in finite time, under Assumption 2. 

Remark 1: The above results hold if we choose the disam¬ 
biguating policy at random, and not as the optimal one in the 
sense of the number of modes disambiguated. However, in 
practice we see that the optimal disambiguating policy gives 
us better performance. 


13 riT = Number of belief modes after simulation; 

14 Alij = A/ij -Cut — riQ ; 

15 return AIij\ 


Assumption 2: The map does not change during the execu¬ 
tion of the planner. 

Proposition 1: Under Assumptions 1 and 2, given any 
initial multi-modal belief 6o = the receding 

horizon planner M3P drives the belief process into a unimodal 
belief br = tut ~ St) in some finite time T. 

Proof: Suppose that the robot is at the initial belief bo. 
Suppose we choose the plan pi* that results in the most 
information gain as required by the M3P planner. The plan 
Pi* can be applied to all the modes at least for some finite 
period of time in the future, since if it cannot be applied, then 
we immediately know that the robot is not at mode and 
thus, there is a disambiguation whereby mode is discarded. 
Once the plan pi* is executed, there are only 2 possibilities: 
1) The robot is able to execute the entire plan pi* till the 
end, or 2) the plan becomes infeasible at some point of its 
execution. 

In case 1, due to Assumption 1, we will know for sure that the 


IV. Simulation Results 

We present simulation results for a 2D robot. The simula¬ 
tions represent a motion planning scenario wherein the robot is 
tasked to go from a start to a goal location in an environment 
where there is symmetry. We initially rely on FIRM [3] to 
provide a feedback policy. However, en-route to the goal, 
the robot is kidnapped to an unknown location and it cannot 
localize. Thus, it relies on the non-Gaussian planner NBM3P 
described previously to localize. Once the robot is localized, 
its new belief is connected to the existing FIRM graph and we 
find a new feedback policy to complete the task. We abstract 
the geometry/appearance of the environment by using passive 
visual beacons that have IDs associated to them. We place 
multiple landmarks with the same IDs in different locations, 
thus creating the ‘illusion’ of multiple-data associations. All 
simulations were carried out on a Macbook Pro laptop with an 
Intel Core-i5 2.6GHz CPU and 8GB of RAM running Ubuntu 
14.04, we use the Open Motion Planning Library [15] as the 
software back-end. 

A supplementary video is provided [1] that clearly depicts 
every stage of the simulations with a robot operating in a 
virtual environment. The key steps in the simulation are: 

1) Generate an initial policy from start to goal. 



2) Keep tracking innovation, large change causes robot to 
detect lost state. 

3) Switch from Gaussian to a GMM representation of the 
underlying belief. 

4) Generate information gathering actions using MSP that 
are able to localize the state. 

5) Switch back to uni-modal Gaussian and connect the 
belief to existing FIRM roadmap. 

6) Find new policy to goal. 

A. Motion Model 

We simulate a 2D ground robot, the kinematics of which 
are represented by a unicycle. 

( x/c + (Vfc + ny)5t cos Ok \ 

Yk^ {Vk^ny)6tsm0k , (5) 

Ok + {uJk + n^)6t j 

where Xk = (x/e,y/e, 6>/c)^ describes the robot state (position 
and yaw angle). Uk = (V/c,^/c)^ is the control vector consist¬ 
ing of linear velocity Vk and angular velocity ujk . We denote 
the process noise vector by Wk = ^ A/’(0, Qk)- 

B. Observation Model 

Our observation model is based on passive 
beacons/landmarks which can be pinged to measure their 
relative range and an associated ID tag. This is parallel to 
a physical implementation using a monocular camera and 
passive Augmented Reality (AR) beacons that can be detected 
by the camera. We use the visual AR model to simplify the 
data association problem. 

Let the location of the i-th landmark be denoted by *L. 
The displacement vector M from the robot to *L is given by 
M = \^dx^'^dyY •= — p, where p = [x,y]^ is the position 

of the robot. Therefore, the observation '^z of the i-th landmark 
can be modeled as, 

'^z = ^h{x^ ^v) = [||M||, atan2(M^, '^dx) — 0]^ -h (6) 

The observation noise is zero-mean Gaussian such that 
A/'(0/R) where = diag((7/^||M|| + (^6>||'d|| + 

cr^)^). The quality of sensor reading decreases as the robot 
gets farther from the landmarks. The parameters r]r and r]o 
determine this dependency, and and are the bias standard 
deviations. The robot observes only those landmarks that fall 
within its sensor range rsensor- 

C. Scenario 

The environment as shown in Fig. 3 represents a warehouse 
floor where 6 corridors Cl-6 appear exactly alike. These 
corridors open into two passages which also appear alike. 
There are unique landmarks at location LI. The robot is put 
at the start location S and tasked to reach goal location G as 
shown. Using FIRM, we generate the feedback policy from 
S to G. The blue ellipse and arrow mark the region where 


the robot will be kidnapped and where it will be displaced 
to respectively First, we generate an initial feedback policy 
from S to G. When the robot is en-route to G (Fig. 4a) it 
is kidnapped and placed in corridor C2 as shown in Fig. 4b. 
A new multi-modal belief is generated which took 51.23s to 
compute. 



Fig. 3: The environment depicting obstacles (gray) and free 
space (white). The cyan disks represent the nodes in the FIRM 
roadmap. The robot is en-route to the goal before kidnapping 
and the initial plan under FIRM is demarcated by yellow. 
The blue ellipse marks the region where the robot will be 
kidnapped (not known to robot) and the final location after 
kidnapping (in C2) is marked by the blue arrow (begins inside 
the ellipse). 


(a) The robot just before kidnapping (b) Robot and multi-modal belief 
(robot is not aware of impending just after kidnapping. Each red 
kidnapping). (smaller) disk represents a belief 

mode. 

Fig. 4: Before and after kidnapping. 

We analyze the recovery from kidnapping at four different 
time steps: 

1) G: Fig. 5a shows the initial candidate policies after the 
new belief is sampled. The policy was computed in 
58.76s. 














2) t 2 '. Fig. 5a shows that on exiting the corridors, the modes 
m 3 and m 4 that originated in C3 and C4 expect to see a 
corner to their left and are rejected as this is not observed 
by the robot. A new policy is then computed which took 
12.48s. 

3) ts! In Fig. 5c we see the new candidate policy. This 
eventually leads the robot to move forward which results 
in the modes mi and me that exit Cl and C 6 to be 
rejected as the robot expects to see a comer. Once these 
two modes are rejected a new policy is computed which 
took 3.7s. 

4) ^ 4 ! Fig. 5d shows the final two remaining modes ap¬ 
proaching the unique landmark which eventually leads 
to a uni-modal belief localized to the tme state. 

Fig. 6 a shows the belief localized, once this belief is added 
to the existing FIRM roadmap, a new policy guides the robot 
to the goal (Fig. 6 b). 

D. Discussion 

It is seen that a highly intuitive behavior emerges which 
guides the robot to seek disambiguating information such that 
it can sequentially reject the incorrect hypothesis about its 
state. The open-loop control actions are regenerated every 
time a Gaussian mode is rejected or a constraint violation is 
foreseen. The time to re-plan reduces drastically as the number 
of modes reduce (0(n^)). Thus, the first few actions are the 
hardest which is to be expected as we start off with a large 
number of hypotheses. Finally, the planner is able to localize 
the robot safely. 


V. Conclusion 

In this work, we studied the problem of motion planning 
for a mobile robot when the underlying belief state is non- 
Gaussian in nature. A non-Gaussian belief can result from 
uncertain data associations or due to unknown initial condi¬ 
tions among others. Our main contribution in this work is 
a planner MSP that generates a sequentially disambiguating 
policy, which leads the belief to converge to a uni-modal 
Gaussian. We are able to show in simulation that the robot is 
able to recover from a kidnapped state and execute its task in 
environments that present multiple uncertain data associations. 
Our next step would be to demonstrate the same on a physical 
system and prove its robustness in real-life situations. 
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